Introduced the enum OutputFormat to validate the output format type#19
Conversation
Pull request overview
This PR introduces type safety for output format handling by replacing string-based validation with a dedicated OutputFormat enum. This change reduces the risk of typos and provides better IDE support through autocomplete and type checking.
- Introduced a new `OutputFormat` enum with three supported formats: QUEPID, RRE, and MTEB
- Updated all references from string literals to enum values across production and test code
- Refactored WriterFactory registry to use enum keys instead of strings
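Based on the file summaries below, the shape of the change is roughly the following sketch. The writer class names and registry layout are assumptions inferred from the PR description, not the actual source:

```python
from enum import Enum


class OutputFormat(str, Enum):
    """Supported output formats for dataset generation."""

    QUEPID = "quepid"
    RRE = "rre"
    MTEB = "mteb"


# Placeholder writer classes standing in for the real ones in the PR.
class QuepidWriter: ...
class RreWriter: ...
class MtebWriter: ...


# Registry keyed by the enum instead of raw strings, so a typo in the
# key is a NameError/AttributeError at import time rather than a silent miss.
OUTPUT_FORMAT_REGISTRY = {
    OutputFormat.QUEPID: QuepidWriter,
    OutputFormat.RRE: RreWriter,
    OutputFormat.MTEB: MtebWriter,
}

writer_cls = OUTPUT_FORMAT_REGISTRY[OutputFormat.RRE]
```

Inheriting from `str` keeps the enum backward-compatible with code that still compares against the old string values.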
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/llm_search_quality_evaluation/shared/models/output_format.py | New enum class defining the three supported output formats (quepid, rre, mteb) |
| src/llm_search_quality_evaluation/shared/writers/writer_config.py | Updated output_format field type from Literal to OutputFormat enum; removed unused Literal import |
| src/llm_search_quality_evaluation/shared/writers/writer_factory.py | Updated OUTPUT_FORMAT_REGISTRY to use OutputFormat enum keys instead of string keys |
| `src/llm_search_quality_evaluation/shared/writers/__init__.py` | Added OutputFormat to module exports |
| src/llm_search_quality_evaluation/dataset_generator/config.py | Updated output_format field type to OutputFormat; refactored validation logic to use enum comparison |
| src/llm_search_quality_evaluation/dataset_generator/main.py | Updated MTEB format check to use enum comparison instead of string |
| src/llm_search_quality_evaluation/vector_search_doctor/approximate_search_evaluator/main.py | Updated WriterConfig instantiation to use OutputFormat.RRE |
| tests/llm_search_quality_evaluation/test_cross_plataform.py | Updated test to use OutputFormat.QUEPID enum value |
| tests/llm_search_quality_evaluation/shared/writers/test_quepid_writer.py | Updated fixture to use OutputFormat.QUEPID enum value |
| tests/llm_search_quality_evaluation/shared/writers/test_rre_writer.py | Updated fixture to use OutputFormat.RRE enum value |
| tests/llm_search_quality_evaluation/shared/writers/test_mteb_writer.py | Updated fixture to use OutputFormat.MTEB enum value |
| tests/llm_search_quality_evaluation/dataset_generator/test_main_autosave.py | Updated Config instantiation to use OutputFormat.QUEPID |
| tests/llm_search_quality_evaluation/dataset_generator/test_config_dataset_generator.py | Updated assertions to compare against enum values instead of strings |
```diff
     raise ValueError("id_field is required when output_format='rre'")
-elif self.output_format == "rre" and not self.rre_query_placeholder:
+elif not self.rre_query_placeholder:
     raise ValueError("rre_query_placeholder is required when output_format='rre'")
-elif self.output_format == "rre" and not self.rre_query_template and not self.query_template:
+elif not self.rre_query_template and not self.query_template:
     raise ValueError("At least one query template is required when output_format='rre'")
```
The error messages still reference the old string format 'rre' instead of the enum. Consider updating these error messages to be consistent with the enum-based approach. For example, change "output_format='rre'" to "output_format=OutputFormat.RRE" or simply "output_format is RRE".
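One way to apply the reviewer's suggestion is to phrase the error messages in terms of the enum member. This is a hypothetical standalone version of the validation logic (the real code lives as methods on the `Config` class in `dataset_generator/config.py`):

```python
from enum import Enum


class OutputFormat(str, Enum):
    QUEPID = "quepid"
    RRE = "rre"
    MTEB = "mteb"


def validate_rre_config(output_format, id_field, rre_query_placeholder,
                        rre_query_template, query_template):
    """Raise ValueError when RRE-specific settings are missing."""
    if output_format is not OutputFormat.RRE:
        return
    if not id_field:
        raise ValueError(
            "id_field is required when output_format is OutputFormat.RRE")
    if not rre_query_placeholder:
        raise ValueError(
            "rre_query_placeholder is required when output_format is OutputFormat.RRE")
    if not rre_query_template and not query_template:
        raise ValueError(
            "At least one query template is required when output_format is OutputFormat.RRE")
```

Using `is` for the enum comparison (rather than `==` against a string) makes the intent explicit and lets static type checkers flag impossible comparisons.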
```python
"""Supported output formats for dataset generation."""
QUEPID = "quepid"
RRE = "rre"
```
Consider adding docstrings to each enum member to describe what each output format is used for. For example:
- QUEPID: CSV format for Quepid evaluation tool
- RRE: Format for Rated Ranking Evaluator
- MTEB: Format for Massive Text Embedding Benchmark
This would improve code documentation and make it clearer for developers what each format represents.
```diff
 """Supported output formats for dataset generation."""
+# CSV format for Quepid evaluation tool
 QUEPID = "quepid"
+# Format for Rated Ranking Evaluator (RRE)
 RRE = "rre"
+# Format for Massive Text Embedding Benchmark (MTEB)
```
Notes
The dataset generator can export the dataset to Quepid, RRE, and MTEB. The export output format was managed internally as a plain string, which risked typos and inconsistencies that type checking could not catch.
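A minimal illustration of the risk described above (the misspelled value is invented for the example):

```python
from enum import Enum


class OutputFormat(str, Enum):
    QUEPID = "quepid"
    RRE = "rre"
    MTEB = "mteb"


# With plain strings, a typo only surfaces at dispatch time, if at all:
fmt = "quepdi"  # silently wrong; no registry key will ever match

# With the enum, an invalid value fails immediately and loudly:
try:
    OutputFormat("quepdi")
except ValueError as exc:
    print(exc)  # 'quepdi' is not a valid OutputFormat
```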
Changes
- Introduced the `OutputFormat` enum, replacing the string literals `quepid`, `rre`, and `mteb` with the proper enum values

Tests